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Abstract 

The computational costs of calculating the matrix sign function of the overlap op- 
erator together with fundamental numerical problems related to the discontinuity 
of the sign function in the kernel eigenvalues are the major obstacles towards simu- 
lations with dynamical overlap fermions using the Hybrid Monte Carlo algorithm. 
In a previous paper of the present series we introduced optimal numerical approx- 
imation of the sign function and have developed highly advanced preconditioning 
and relaxation techniques which speed up the the inversion of the overlap operator 
by nearly an order of magnitude. 

In this fourth paper of the series we construct an HMC algorithm for overlap 
fermions. We approximate the matrix sign function using the Zolotarev rational ap- 
proximation, treating the smallest eigenvalues of the Wilson operator exactly within 
the fermionic force. Based on this we derive the fermionic force for the overlap op- 
erator. We explicitly solve the problem of the Dirac delta-function terms arising 
through zero crossings of eigenvalues of the Wilson operator. The main advantage 
of this scheme is that its energy violations scale better than 0(At 2 ) and thus are 
comparable with the violations of the standard leapfrog algorithm over the course of 
a trajectory. We explicitly prove that our algorithm satisfies reversibility and area 
conservation. We present test results from our algorithm on 4 4 , 6 4 , and 8 4 lattices. 
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1 Introduction 



For more than two decades lattice QCD simulations with light dynamical 
quarks remained intractable as the chiral symmetry of the underlying QCD 
Lagrangian, which holds in the case of zero mass quarks, could not be embed- 
ded properly into flavour conserving fermion lattice discretization schemes. A 
standard workaround takes recourse to fairly heavy quarks instead and ex- 
trapolates the results over a wide range of quark masses to the very light 
quark mass regime. Unfortunately, simulating far beyond the realm of chiral 
perturbation theory such extrapolations carry systematic errors. These errors 
have to be avoided in order to achieve a high precision of phenomenological 
observables. 1 

During the 90 's the first chiral lattice Dirac operators were written down. 
While studying domain wall fermions [2], an early attempt to formulate chiral 
fermions on the lattice, Neuberger and Narayanan realised that the Nielsen- 
Ninomiya theorem [3] can be circumvented by placing an infinite number of 
fermions on the lattice. This insight led them to the overlap lattice Dirac 
operator [4,5]. Afterwards, Hasenfratz, while working with classically perfect 
fermions, realised that the small mass bottleneck could be overcome by using 
a discretization scheme that obeys a lattice variant of chiral symmetry [6], as 
expressed by the Ginsparg- Wilson relation for the quark propagator [7], which 
in turn implies a novel version of chiral symmetry on the lattice [8]. Neuberger 
showed that the overlap operator obeys the Ginsparg- Wilson relation. The- 
oretically, such a scheme induces a dramatic reduction in fluctuations in the 
smallest Dirac operator eigenvalue compared to naive Wilson fermions in the 
vicinity of zero quark mass. However, there is a numerical obstacle: the imple- 
mentation of the overlap operator requires the very frequent solution of linear 
systems involving the inverse matrix square root or, equivalently, the matrix 
sign function (of the Hermitian Wilson-Dirac operator Q). 

The problem of approximating the application of sign(Q) on a vector has 
been discussed in a number of papers, using polynomial approximations [9- 
13], Lanczos based methods [14-17] and multi-shift CG combined with a par- 
tial fraction expansion [18-21]. The Zolotarev partial fraction approximation 
(ZPFE), first applied to lattice QCD in [22] - the first paper in the present se- 
ries - was shown to be the optimal approximation to the matrix sign function. 
ZPFE has led to an improvement of over a factor of 3 compared to the Cheby- 
shev polynomial approach [11]. This technique to compute the sign function 
has meanwhile been established as the method of choice [23-25]. Moreover, 



Simulations with staggered fermions are less prone to fluctuations. However, it is 
yet unclear if staggered fermions have the correct continuum limit in case of their 
one-flavour approximation by the fourth root trick [1]. 
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it is the natural starting point for both the treatment of dynamical overlap 
fermions [26] and optimised domain wall fermions [27-29]. 2 

Until recently, simulations with overlap fermions were restricted to the quenched 
model where fermion loops are neglected [34-37], or hybrid calculations using 
staggered or Wilson sea quarks, because of two reasons: (i) the sheer costs of 
the evaluation of sign functions of matrices with extremely high dimensions 
and (ii) the lack of exact algorithms including dynamical overlap fermions. 

In this paper, we construct a Hybrid Monte Carlo algorithm [38] for QCD 
with overlap fermions. 3 Mostly, we just need to adapt the algorithm as ex- 
isting for example for Wilson fermions to the overlap operator. However, the 
discontinuity in the matrix sign function used in the overlap construction 
causes additional complications: as soon as one of the eigenvalues of Q crosses 
zero, the change in the sign of the eigenvalue leads to a discontinuity in the 
fermionic part of the action, which introduces the occurrence of a Dirac delta 
function in the HMC fermionic force, as first discussed in [26,43]. This effect 
is predominantly seen in strong coupling: eigenvalue crossings are rare in the 
weak coupling limit [45]. 

The major part of this paper will deal with the proper integration of this sin- 
gular force. Important new contributions in this paper are: (1) We calculate 
the energy violation at the crossing exactly; (2) We explicitly treat the im- 
plementation of the eigenvalue projection technique in the HMC algorithm; 
(3) Most importantly, we introduce a momentum update when there is an 
eigenvalue crossing with improved energy conservation properties, with the 
leading errors being of order 0(Ar 2 ). This is based on our generalized scheme 
to construct higher-order exact treatment of the eigenvalue crossing. We are 
going to demonstrate that this feature, along with the dependency of the small 
eigenvalue density leads to an HMC algorithm with computational complexity 
of 0(V 5 / 4 ). (4) We discuss what happens when two eigenvalues cross within 
the same molecular dynamics step. 4 

The main practical difficulty in running dynamical overlap simulations is the 
cost in computer time. Overlap fermions are at least 0(100) times more expen- 
sive to compute than Wilson fermions. In other papers of this series [46-48], 
we discussed how the inversion of the overlap operator, a very time consuming 

2 An alternative scheme is to use a 5-dimensional representation of the overlap oper- 
ator - see references [30-32] . An HMC algorithm using a polynomial approximation 
to the sign function was tested in [33], and found to be inferior to the ZPFE. 

3 Earlier work done concerning dynamical overlap fermions in the Schwinger model 
can be found in [39-42]. More recently, some exploratory studies in full QCD 
[26,43,44] have been presented. 

4 In a forthcoming paper we will present an advanced scheme for the treatment of 
two small eigenvalues. 
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part of the HMC algorithm, can be accelerated. Here we will make use of all 
these methods. We estimate the time required with our current algorithms for 
a full HMC simulation on moderate lattices at realistic masses. Early results 
were outlined in [49]. 

Section 2.1 gives a brief introduction to the Hybrid Monte Carlo method for 
generating dynamical configurations. Sections 2.2 to 4 outline our method for 
adapting HMC to the overlap operator. Section 5 gives first numerical results, 
demonstrating that the algorithm works in practice. After a brief conclusion, 
we include two appendices, proving that the proposed algorithm satisfies de- 
tailed balance and describe the additional implications of a combined trans- 
mission/reflection update within a normal leapfrog integration. Furthermore 
we describe the correction update when two eigenvalues cross zero during the 
same time step. 



2 Hybrid Monte Carlo 

2. 1 HMC for Wilson fermions 

In order to make the paper self-contained, we give a short review of the HMC 
algorithm for the case of Wilson fermions [38]. 

The standard Wilson Dirac operator on the lattice, with a mass — m , is 

M xy =l xy (1 - 7m) U A X ) 5 x+e»,y 



+ (1 + 7 M ) U\{x - eJSz-^y 

The Wilson operator is 75-Hermitian, implying that one can construct a Her- 
mitian Wilson operator 

Q = 75M. (2) 

The HMC method updates the gauge field in two steps: (1) a molecular dy- 
namics evolution of the gauge field and (2) a Metropolis step which renders 
the algorithm exact. In the molecular dynamics step, a momentum n is intro- 
duced, which is conjugate to the gauge fields U. Using a fictitious computer 
time, r, the momentum field is defined such that 

dU/dr = U = iUU. (3) 
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Since U G SU(iVc), II must be a Hermitian traceless matrix. Following the 
classical equations of motion of this system will allow to generate the correct 
ensemble. Using the Wilson gauge action, the total energy of this system is 



E(r) =/3£ 

X 

X w =Q- 2 $. 



1 - 



1 



2N, 



Tr(u(x), u + U(x)l)\ +Xl$ + ±J2™ 2 , (4) 

(5) 



U{x)^ v is called the plaquette, $ is a Gaussian random spinor field, and Nc 
is the number of colours (for QCD Nc = 3). The second equation of motion 
can be inferred from the condition 



E 





ETr 



^(x)U^x)V,(x) + Tl^x)F,(x) 



h.c. 



This leads to 



tl^x) = i 



^ Ulx (x)V,(x) + F,(x) 



h.c. 



(6) 



(7) 



Traceless 



Here V^x) is the staple, the sum over all the remaining parts of those plaque- 



ttes which contain the specific gauge link U^x). F^{x) is the fermionic force, 
which can be found by differentiating with respect to U^{x). For Wilson 

fermions, the fermionic force is given by 



F^x) = K 



MX W \ Xl(x)(l + 7m ) + X w (x + e M ) (mX w )\x)(1 - 



(8) 



These classical equations of motion have to be solved numerically. An im- 
portant requirement for the Markov process is to maintain detailed balance, 
in order to guarantee the generation of the correct canonical distribution. To 
maintain detailed balance, each molecular dynamics update from computer 
time r to computer time r + At needs to be both area conserving, i.e. the 
Jacobian for the update is 1, and reversible. 5 The leapfrog algorithm fulfils 
both these requirements, and, over an entire trajectory, conserves energy up to 
order At 2 . 6 For later convenience, we will write it in terms of a four-step pro- 
cedure updating the momentum fields and gauge fields in turn (see algorithm 



5 Starting with a configuration {[/(O), 11(0)}, we perform a molecular dynamics 
step to reach a configuration {U(At), II(At)}. If the update is reversible, carrying 
out the molecular dynamics step backwards, with At — > —At and II — > —II, will 
return to the original configuration. 

6 There are superior integrators to the simple leapfrog, such as the Omelyan integra- 
tor [50,51], and the Sexton- Weingarten Integrator [52], but the methods described 
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(1) n(r + Ar/2) = n(r) + Aril(r)/2. 

(2) C/(r + Ar/2) = e^Mr+A^)^^^ 

(3) U(T + Ar) = e i(Ar/2)n(r+Ar/2) [/ ( r + Ar / 2 )_ 

(4) n(r + At) = U(t + At/2) + AtI1(t + At) /2. 
Algorithm 1. The standard leapfrog update. 

1). The HMC update consists of n m d molecular dynamics steps followed by a 
Metropolis step, which corrects for the small violations in energy conservation 
due to the numerical integration. At the end of the trajectory, the configura- 
tion is either accepted or rejected, with the probability of acceptance being 
P acc = min(l,exp(E — E')). Here E is the initial energy at the start of the 
trajectory, and E' the final energy. To generate a sufficiently large number of 
new configurations, a high rate of acceptance is required. Thus, the molecular 
dynamics procedure has to conserve energy as well as possible. Any sizable 
violation of energy conservation will lower the acceptance rate, and thus the 
efficiency of the algorithm. 

2. 2 Naive HMC with overlap fermions 

The overlap operator is given by [18]: 

J D=(l + /i)+ 75 (l-/i)sign(g), (9) 

where /i is a mass term. The bare fermion mass is 

m b = 2//m /(l - fj), (10) 

where m is the quark mass of the Wilson operator D w = j^Q. 
The Hermitian overlap operator reads 

H = l5 D. (11) 

Thus, the pseudo-fermion action is given by S p f = (p^H~ 2 (p. 

2.2.1 Eigenvalues outside Zolotarev range. 

We approximate the matrix sign function using a rational approximation. 
It is advantageous while calculating the sign function to treat the smallest 
eigenvalues of Q explicitly in a spectral representation. If we project out the 
lowest n e eigenvectors of Q, then the rational fraction expansion, which is used 

here can easily be adapted to these integrators; indeed we are using the Omelyan 
integrator in our current production algorithm. 
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to approximate the sign function for the eigenvalues of Q 2 within the fixed 
range [a 2 ,/9 2 ], is modified to 

(n e \ n e 

+EI^X^k(A z ), (12) 
1=1 J 1=1 



a 2 Q 2 + Ck(a,(3Y y ' 

Here a = 1/a, are the eigenvectors of Q with eigenvalue A;, and e(A) 
denotes the sign function. We shall assume that u and (, the coefficients of the 
rational fraction, are known (here we used the Zolotarev coefficients [53,54]). 
During the course of the Hybrid Monte Carlo, we keep the coefficients a and 
(3 fixed to maintain the acceptance rate, which requires the projection of all 
eigenvectors of Q with eigenvalues |A| < a. 



2.2.2 Eigenvalues inside Zolotarev range. 

Of course, one is free to project out eigenvectors within the Zolotarev range 
as well, either exactly, as in (12), or from the rational approximation itself: 

(n p \ 
i- E h/Wl + 
l=rt+l J 

n p 

E I^^IE^r^- (i4) 

l=n e +l k a A l 

This accelerates the calculation of the multi-mass solver. Our preferred method 
that allows to simply fulfil the requirements for detailed balance, is to project 
out a fixed number, n p , of eigenvectors, treating the n e eigenvectors below a 
explicitly according to (12), and using (14) for any eigenvectors that lie within 
the range of the rational fraction approximation. 



2.2.3 Differentiating Eigenvectors. 

In order to calculate the fermionic force one needs to differentiate the sign 
function with respect to fictitious time r, which means that one must differ- 
entiate both the rational fraction and the small eigenvalues and eigenvectors. 
To differentiate the eigenvalues, we start with the eigenvalue equation 

QW) = \iW). (15) 

We now perform an infinitesimal change in the matrix Q, Q — > Q + SQ. The 
new eigenvalue equation reads 

(Q + SQ) (W) + \5j) = (X t + *A) (V ) + \S)) . (16) 
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Since we are free to define \5) so that {tp l \5) = 0, we immediately have: 

A* =(^W) (17) 
^W) = -PiQW) (18) 
T P l =(f - W)^ l \){Q - A,)- 1 ^ - l^'X^I). (19) 

We have added a second eigenvector projector to (19) so that both (<f>\Pi and 
Pi\4>) can be calculated numerically. 7 We note that this expansion is only valid 
if the separation between the eigenvalues is sufficiently large (see appendix 
B). We will treat the case when the two eigenvalues are near degenerate in a 
forthcoming paper [55,56]. 

We use a CG multi-mass solver to perform the inversion of Q — A which is 
required in (18). We exploit the normal-equation trick: 

^-(i - mm = (q+ AOga^i - £ i^-x^-i) + 

E J ' 



to convert the inversion of Q — \ required in the calculation of as given 

in (18) into a positive definite form — we are working in a subspace orthonor- 
mal to all the eigenvectors with eigenvalues lower than Aj. We can now use a 
CG multi-mass solver to evaluate the inversion. With a sufficiently large num- 
ber of kernel eigenvalues projected out of the Zolotarev approximation, this 
procedure proved to be faster and more stable than using a Minimal Residual 
inversion or a Chebyshev approximation of the inverse [57] for each eigenvec- 
tor. The multi-mass inversion converges well, as long as we project into the 
subspace frequently enough during the inversion, and n p is sufficiently larger 
than n e to ensure that the condition number of the multi-mass inversion re- 
mains under control. Tuning a is therefore a balancing act between different 
parts of the algorithm: a large a will correspond to a slower multi-mass in- 
version for the eigenvector differentiation (because we will be increasing the 
condition number for the inversion), but a faster inversion for the sign func- 
tion (since the Zolotarev weights generally increase as the condition number 
decreases) . 

The eigenvectors have to be determined whenever we calculate the fermionic 
force. We used the Arnoldi PARPACK package with Chebyshev acceleration 
[58,59] to calculate the lowest eigenvalues, and a CG minimisation of the 



7 These equations are just used to calculate the fermionic force. We cannot use the 
updates to provide a time evolution of the eigenvectors, for reasons of reversibility, 
computer time and accuracy. 
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Ritz functional [60] to ensure that no small eigenvectors were missed by the 
Arnoldi, and that all the eigenvalues were calculated to the required accuracy. 
This method proved to be considerably faster than using entirely a CG min- 
imisation, even though we can use the old eigenvectors as a starting point for 
the new calculation when using the CG algorithm. 

We are now in a position to calculate the fermionic force in the usual manner 
(sums over k, I and repeated spatial indices /x and x will be assumed from this 
point onwards): 

U^x)F,{x) + Fl(x)Ul(x) = - (1 - /i 2 )(X| (js^signQ + ^mQl^j \X), 

(20) 

X =H~ 2 (f). (21) 
The differential of the sign function is: 
/ d 



dr 



sign Q = U^x) [F% n Jx) + F? nm (x) + F^Jx) + F^ n Jx)\ + h.c, 

/ nm 

(22) 



where F R is the term generated by differentiating the rational approximation, 
F p is the term generated by differentiating the eigenvector projector 1 — 
while F s and F D come from the differential of e(A) in (12). These 
four terms are: 



~ Cfe75(l - l^a,Jx + e„d] A k de (l - |^)(^ 

-KQA k na (\^ l )(^\) axl5 (l - ^)5 x , c+e ^ cm . 



em 



Fining) =KPl,nxlse{k){l-l»)5 x ,c + e» {W)^\) cm 
+ K (1^X^1)^75(1 - 7M)^,c+e M e(A,)fl )Cm , 

F^{x) = - {W)W\) nm ^(^1,75(1 - 7,)<W^)c- (23) 



fj,,nm\ 



For later convenience, we define F G as the gauge field force, and F c as the 
continuous part of the fermionic force, i.e., 

=(X\^U„(x)V,(x)\X) + h.c. , (24) 

F° (x) = - (1 - f)iU»{x){X\ { 75 , F?(x) + Ffrx) + F*(x)} \X) + h.c. . 

(25) 



9 



In order to calculate the fermionic force, we need to invert the overlap operator 
twice, perform two multi-mass inversions of the Zolotarev rational function, 
and, as remarked before, calculate four multi-mass inversions of the Wilson 
operator to obtain the force from the small eigenvalue projection. This for- 
mula can be inserted into a standard HMC routine for Wilson or staggered 
fermions. However, the last term in (23) contains a Dirac delta function (the 
derivative of the sign function) that will ruin the performance of standard 
integrators. Whenever an eigenvalue of the Wilson operator crosses zero on a 
HMC trajectory, special attention has to be paid to this last term. We note in 
passing that eigenvalue crossings of the Wilson operator are associated with 
a change in the topological index Qf — — |Tr(sign(5). 



3 Eigenvalue Crossings 

3.1 Possible strategies 

There are several possibilities which can be taken to overcome the problem of 
the eigenvalue crossing: 

(1) Ignore the problem in the hope that it will vanish with increasing values 
of (5. In this spirit, one would use chiral projection [39,61,62] to allow 
for sampling across different topological sectors, and a small path length 
to compensate for the low acceptance rate. In principle this recipe might 
allow the simulation to bypass the potential wall. In practice, with the 
pseudo-fermion estimate of the determinant used here, the height of the 
action jump at the topological sector boundary is too large and attempted 
crossings too frequent to allow this method to be practical. In future work, 
we will show that using another representation of the determinant can 
reduce the height of the action jump to 0(1), which might make this 
method practical [63,64]. 

(2) Replace e with some continuous function for small A [39]. The substitute 
of the sign function will have to be broad enough so that the low eigenval- 
ues are affected, but not too broad as this might lead to a large deviation 
in the final Monte Carlo ensemble. In principle, it should be possible to 
narrow the revised e as the time step decreases. One can reduce these arti- 
facts by using a more accurate overlap operator in the accept/reject step 
than in the molecular dynamics [39], but this will lower the acceptance 
rate. Our experience with this method suggests that it cannot achieve an 
acceptable acceptance rate. 

(3) Restrict the simulation to one topological sector, either by always re- 
flecting (see section 4.4.2) when one encounters a potential eigenvalue 
crossing [65] , or by using an action which suppresses small eigenvalues of 
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the kernel operator (both by chosing a topology preserving gauge action 
and by adding a fermionic term) [66] . These methods have the disadvan- 
tage that one needs to calculate the re-weighting factors to combine the 
different topological sectors, and (more seriously) there are unresolved 
issues concerning whether these methods are ergodic. 
(4) As soon as one encounters a crossing, one repeats the micro-canonical 
step, with the integration over the delta function treated exactly [26,43]. 

Albeit being more costly (per HMC trajectory; the situation when the auto- 
correlation is taken into account is unclear) than the first three approaches, 
method (4) is our strategy of choice as it offers best control over systematic er- 
rors from the Dirac delta function's contribution to the fermionic force, as will 
be explained in the following sections. Moreover, it will provide a systematic 
way to improve the scaling of the HMC with overlap fermions. 



3.2 Computing the discontinuity 



The eigenvalue crossing induces a discontinuity in the fermionic contribution 
to the action, the kinetic energy, and the fermionic force. From the Monte 
Carlo energy (4) the second equation of motion is derived imposing energy 
conservation: 



dE 
~dr~ 



o = ... + $tA(V 2 )$ 



(26) 



Care needs to be taken when differentiating the fermionic contribution to 
the action near to a discontinuity in the overlap operator. We note that for 
discontinuous functions a and b, with a = a c (r) + A a 9{r — r c ) and b = b c (r) + 
A b 6{r — t c ) and a c and b c being continuous functions, the differential of the 
product of a and b is not the usual formula: 



: lim — 

St^o St 

lim — 

St^o St 



(a c (r)+A a 



a c (r + St) + A a b c (r + Sr) + A b - a c (r)6 c (r 



A a b c (r + 5t) + ( a c (r + St) +A a )A b 



+ 



\ db c da c (r) ir , . 



(27) 



With b = a 1 , one can show that A a 6 c (r) + (a c (r + St) + A a )A b = 0, and that 



^-l = -a- 1 (T + 5T)^a- 1 (r). 
aT UT 
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Therefore, 



dr St-*o dr 



H~\r). 



(28) 



If there is a discontinuity in H, such as when the eigenvalue Ai crosses zero, 
then we need to take one inversion just before the crossing, leading to X_ = 
H~ 2 (\i)4>, and the other inversion just after it, with X + = H~ 2 (—\i)4>: 



d 



d 



l-fi )(X+\ 75^sign Q + —sign Q75 \X. 



dr 



dr 



(29) 



We denote the momenta just before and after the eigenvalue crossing as IT - 
and Il + respectively (with the smallest eigenvalues A_ and A+). We can recast 
equation (29) into the form 



25r 



ti + ) 2 _ { n-f 



- (1 -/i 2 )(X+| ( 75 |^)(^| + I^)(^l7 5 ) \X. 



e(A_)-e(A + ) 
5r 



(30) 



This shows that integrating over the Dirac 5 function in the fermionic force 
will produce a discontinuity in the kinetic energy: 

(n + ) 2 -(n-) 2 =4rf(r c ) (31) 
d(r) = - (1 - / , 2 ) e (A_)(X + (r)| 75 |^(r))(^(r)|X_(r)) 

--(l- /U 2 ) e (A_)(X + (r)|^(r))(^(r)| 75 |X4r)). (32) 

It is straightforward to show that this discontinuity in the kinetic energy will 
exactly cancel the discontinuity in the pseudo-fermion action (X + |$) — (X_|<3>). 
Therefore, energy would be conserved across the eigenvalue crossing in an ex- 
act integration. Of course, numerical integration schemes are not exact and 
will produce a huge error at the discontinuity. Our task is to develop an inte- 
gration algorithm such as to maintain area conservation, energy conservation 
and reversibility in the presence of Dirac delta function forces. 



4 HMC with Overlap Fermions 



In this section we propose to add a correction step to the standard leapfrog 
algorithm (algorithm 1). The correction step allows us to handle the above 
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discontinuities from eigenvalue crossing as desired. In order to realize a proper 
HMC scheme for overlap fermions, the correction step has to be area preserv- 
ing 8 and reversible (ergodicity). In order to preserve the 0(At 2 ) complexity 
of the standard HMC, one must satisfy equation (29) with 0(Ar 2 ) errors or 
better. In the first part of this section, we shall introduce a general method 
suitable correction steps can be derived in a systematic manner. Thereafter, 
we shall present our integrator of choice, leaving the proof to appendix A that 
its correction step is area conserving, reversible and conserves energy with 
only 0(At 2 ) errors. 

4-1 Notation 

The space-time lattice is 4-dimensional with V lattice sites and AV links. In 
general, the SU(iVc') g au g e group is used (although in practice, throughout 
this work, we will be using Nq = 3), so the gauge field U T contains a member 
of SU(iVc<) on every link, and the momentum field Il r is represented by a 
Hermitian, traceless Nc x Nc matrix on each link. The subscript r refers to 
the computer time at which the gauge or momentum field is calculated. For 
convenience, At is set to 1, the start of the algorithm is at time 0, so that Uq is 
the original gauge field, and U\ is the final gauge field at the end of the leapfrog 
correction step. r c + 1/2 is the computer time at which the eigenvalue is 
the calculation of r c is discussed at the end of this section. The superscript 
"— " is used to indicate that the effects of the crossing are not yet included 
into the momentum update, and "+" indicates that the momentum has been 
updated to account for the crossing. H" + = lit, etc., and U c = UT +T = Ut +T ■ 

Finally, the notation (A,B) shall be used to represent J2 x ,iJ, r ^ r {Afi{x)B fl (x)). 
The continuous part of the (Hermitian) force is F T = F G + , with F G and 
F c defined in equations (24) and (25). 

U contains an element of SU(iVc) for every link, while n is a generator of U, 
i.e. it contains a Hermitian, traceless Nc x Nq matrix on every link. 

In order to simplify the notation for the correction step in its most general 
form in the following, n is expanded in terms of an orthonormal basis. This 
basis is defined by a set of orthonormal basis vectors which are divided into Ng 
subsets, {{r)l,r)l, . . .}, 77!, ...},.. .}. The parameter N k gives the number 
of i] vectors in each subset k. The r/f are 4V(N C — 1) Hermitian traceless 
matrices which satisfy (77^,77™) = 5ij5 km . N k and N s are defined such that the 



We note in passing that area preservation is not a strict requirement. Non-area 
preserving higher-order methods will be introduced in subsequent papers which will 
increase the rate of transmission [61] and allow a solution to problem of mixing 
between low lying eigenvectors [55] . 
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subscripts (i and j) run from 1 to N k , where Nk is not necessarily constant 
for all the k, while the superscripts (k and m) run from 1 to Ns- For the 
moment, we shall leave the N k s and N s as arbitrary parameters, which satisfy 
the constraint 

N S 

Y,N k = W(N 2 c -l). 
k=i 

Thus, the basis is complete, so that 

N S N k 

n-EEiEn). (33) 
fc=i i=i 

The 7] matrices are functions of U c , and are otherwise independent of the 
momentum. 7]\ = 77 is defined as being normal to the gauge field surface where 
the eigenvalue is zero (see appendix A and equation (A.29)). r]\ is proportional 
to F — 77(77, F), where F = F+ +1/2 — F~ +1/2 . The other 77 matrices are arbitrary. 



4-2 The classical mechanics case 

Before the situation of the molecular dynamics with the full overlap opera- 
tor is considered, the much simpler problem of a classical mechanics particle 
approaching a potential wall of height —2d is reviewed. 

The potential energy in this case is defined as V(q) = —2d0(q), (i.e. V — for 
q < 0, and V = —2d for q > 0). Note that d may be positive or negative. First, 
consider the one- dimensional case. The kinetic energy of the particle before it 
hits the wall is |(n~) 2 . After it has hit the wall, there are two possibilities: if 
the initial momentum is large enough, the particle will continue onwards with 
a changed momentum Il + = ^/(n~ 2 + 4d); this case is called transmission . If 
the momentum is too small, the particle will carry insufficient kinetic energy 
to cross the wall and will be reflected (elastically), giving a final momentum 
Il + = — Il~. This case is called reflection . The total energy in 2 -!-!/ is of course 
conserved in both cases. 

Next consider the classical mechanics particle in three dimensions. The co- 
ordinate system shall be defined in terms of the orthonormal basis vectors 
i], r){, and rj\ as just given above. It is assumed that the potential is given by 
V(q) = — 2d0((q, 77)), such that rj is the normal vector to the potential wall. 

The three components of the momentum are defined as (II - , 77), (U~,r]f) and 
(Il~,77|). It is well known what happens to the particle in classical mechanics 
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after it hits the potential wall; in the case of transmission, 



(n+ 77) = ^/(n- 77)2 + 44 (34) 



is obtained, and the transverse moments are 



(n+itf) = (n- tf). (35) 



For reflection, the final momenta will be 



:n + ,77) = -(rr,77), 



and 



(11+ tf)=(II- (36) 



respectively. Again, both cases conserve energy. 

In fact, there is no deeper reason why the molecular dynamics trajectory must 
follow the classical mechanics trajectory, as long as it is energy conserving, area 
conserving, reversible and ergodic. One can equally well use the scheme 



(n + ,77)=(rr,77) 

u + , v D =(n-,77 l 1 ) V / i + 4rf/((n-,77 1 1 )2 + (n-, 771)2), (37) 



as long as it conserves the area. It is evident that this update, as well as 
many others which can be chosen, conserves energy. The given example also 
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conserves the short calculation demonstrates: 



d(U+,rjl) 



d(U-,rj}) 



3 

Ju 



Jll Jl2 
J21 J22 



1, 



'1 + 



Ad 



(u-, v iy + (u-, v iy 

Ad(U-,r]l) 2 ( _ 



J12 
J21 
J22 



4d(n-,^)(n",^) 
((n-, ^ + ^-,1^)2)2 

4d(n- ^xn- ifa 1 ) 
((n-.^ + ^-.ifaW 



1 + 



4rf 



-1 



(n-,^ + (n-,^)^ 



'1 + 



'1 + 



4rf 



(n-,r /l 1 )2 + (n-,^ / 

Ad 

(u-, v iy + (u-,4)\ 



'1 + 



4d 



(n-,^ + (n-,^)2 
4d(ir,r fe 1 ) 2 / , 



4rf 



((n-^ + oi-,^ 



(n- i^ + oi- ifc 1 )' 



(38) 



It can be shown that this latter update is reversible, since reversing the time 
changes d — > —d. Thus, instead of creating a discontinuity in the momentum 
normal to the potential wall, we are introducing the discontinuity in the di- 
rections transverse to the potential wall. Furthermore, one can also consider 
a 5-dimensional case, with two additional basis vectors rjl and 773 , which pro- 
vides the option of changing momentum in any or all of the three directions 
defined by 77, {rjl,^}, an d {vhvi}- This general scheme will allow to achieve 
the 0(Ar 2 )-scalability of the approach. 



4-3 The QCD situation 



The are two differences in lattice QCD molecular dynamics to the classical 
mechanics example. Firstly, one has a considerably larger space, the configu- 
ration space of the gauge fields on the lattice. Assuming that one is working in 
SU(iV c ), with a D- dimensional lattice, and V lattice sites, then the gauge and 
momentum fields exist within a DV(Nq — l)-dimensional space, which gives a 
lot of freedom in how to update the momentum fields. Secondly, a numerical 
integration is performed rather than an exact integration, and care must be 
taken as to the effects of time discretization on both the energy conservation 
and the area conservation. 
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Simple update algorithm 



Now, the QCD correction update can be constructed, (II , U ) — > (Ui,Ui). 
The first step is to update the gauge and momentum to time r + At/2, as in 
the leapfrog procedure (algorithm 1), to yield fields IT~ and U~: 

n-=n + (F -)^ 

U~ =e lArn -/ 2 U . (39) 
Next, the integration proceeds up to the crossing point itself: 

117 , T =rr + t c f~ (40) 

2+Tc 

U c = e iTcIl ~U-. (41) 
We now perform the correction step. For transmission, we use equation (31): 

(nt +rt ,n| +ri ) = (n, +v n, +rt ) + 4 <i . (42) 

In terms of our basis, a general solution of (42) 9 is 



k=l \i=l 




db 



\2 



E4=4rf. (43) 

Finally, we move back to computer time r + At/2, and complete the rest of 
the normal leapfrog integration 

n + =nt +rc - t c (f + ) (44) 

U x = e ^/2U+ u+ 

n 1= n+ + i^. (45) 

This defines the simple update algorithm. Unfortunately we cannot make use 
of it because of two reasons: firstly, there is the possibility that one of the 
square roots in equation (43) might be imaginary (which would mean that II 
would no longer be Hermitian); and secondly because the steps described in 
equations (40) and (44) violate detailed balance. Although r c is a function of 
the momentum, this alone is not enough to violate detailed balance, but, as 
shown in appendix A, an update of the momentum parallel to 77 will violate 
area conservation. 



9 It should be observed that this is not the most general solution: there are other 
possibilities involving error functions. 
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Maintaining detailed balance 



In order to satisfy detailed balance, we can update the momentum in directions 
normal to 77, i.e. we replace (40) and (44) by II^ 2+Tc = U ± + T c (F ± — r](r], F ± )); 
note however that this replacement comes at the cost of an 0{r c ) violation 
of energy conservation. In appendix Al.l, we shall prove that a more flexible 
general momentum integration step which satisfies detailed balance is given 
by 

n + =n- - r c {F - 77(77, F)) - yTrF + 77(77, n~) (Jl + j-±L^ + 



N S 

£ 

k=2 



v k M, n- - ^(f + + f-)) + v k 2 (vl n- - ^(f+ + f~)) 



\ K, n- - f (f+ + f-)Y + ivl, n- - f (f+ + f-)]* 



- 1 



(46) 



We have inserted the fermionic force dependence here for later convenience. To 
satisfy detailed balance, we need N k = 2 Vfc, and the d k should be functions 
only of the gauge field at the crossing and (for k ^ 1) (77, II - ), and odd 
functions of At. 

If we set d 1 = Ad, and d k — \/k ^ 1, then we get an algorithm similar to that 
proposed in [26,43]. However, the significant disadvantage with this method 
- as remarked earlier — is, that it has an 0(r c ) energy conservation violating 
term (see appendix A. 3): AE T = t c ((F + ,r))(U + — (F~, 77) (II - , 77)). This 
term would cripple and extremely slow down the Overlap-HMC algorithm for 
large lattices. In this case a much smaller time step should be applied at large 
volumes (as the number of correction steps increases) — rendering the HMC 
impractical. 



Improved error behaviour 

Using the just proposed general update will allow a better, 0(Ar 2 ), scaling of 
the energy conservation, as shown in A. 3. Instead of adding 2d to the kinetic 
energy, we add 2d — AE T , by setting J2 dk = Ad — 2AE T . We can also remove 
some (though not all) of the 0(Ar 2 ) error in the same way, leaving us with 
errors which are close to 0(Ar 3 ), as demonstrated numerically in section 5.1.2. 
The presence of 0(Ar 2 ) errors is not inconsistent with reversibility because 
r c is an even function of At. 

These are the most important improvements of our method over that proposed 
in [26,43]. 
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Transmission and reflection 



A further problem arises when one of the square roots in equation (46) becomes 
imaginary. In analogy to the classical mechanics picture discussed above, fol- 
lowing [26,43], we reflect off the A = surface. On the one hand, if the mo- 
mentum is large enough, then we pass through into the new topological sector 
with a changed momentum. As in the classical mechanics example, we call 
this case transmission, 10 and this is the scenario described so far in this sec- 
tion. On the other hand, if the momentum is too small, we bounce off the 
A = surface. As before, this is denoted as reflection. Unlike transmission, a 
reflection update is accompanied neither with a change in topological index 
nor with a discontinuity in the fermionic action. 

Still there is a further subtlety which needs to be addressed before one can 
formulate the algorithms for transmission and reflection. With the techniques 
used in this article, the combined update step should also be area conserving. 
As an example, suppose that we set d\ = Ad, and all the other dk = 0, and we 
reflect whenever 1 + ^n-) 2 < ^- The method by which we choose to reflect or 
transmit is denoted as the reflection condition. One can write: 



When calculating the Jacobian one must differentiate the 9 (step) functions, 
which might lead to a Jacobian of the form 1 — 5(1 + Ad/(rj, II~) 2 ). This issue 
is outlined in more detail in appendices A. 2. 3 and A. 2. 4. In these appendices 
we demonstrate that the ^-function is not present if the reflection condition is 
independent of the momentum (i.e. we reflect when \d\ > d max ), or if there is no 
discontinuity between the transmission and reflection updates. We also argue 
in the appendices that even if the 5— function is present in the Jacobian, it will 
not affect the final ensemble. We test this conclusion numerically in section 
5.4. Both theoretically and in the numerical experiment, no systematic error 
is found. 

4-4 Integration over the discontinuity 

A first approximation of r c can be determined by a Taylor expansion of the 
eigenvalue around zero (see equation (17)): 




(47) 





It is called refraction in [26,43] 
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This gives the correct value of r c up to order r^. We can, in principle, calculate 
r c using (48) from either the gauge field U + or the gauge field U~ . There will 
be a small deviation between the two calculations, and therefore a small dis- 
crepancy between d and 77 calculated from the two gauge fields. This will lead 
to a breakdown in reversibility if we apply algorithm 2 naively. There are two 
possible ways in which this problem can be overcome. Firstly, one can calculate 
t c from U~ if r c /Ar < (i.e. we have already passed the crossing at molecular 
dynamics time r + Ar/2), and use U + to calculate r c otherwise. Calculating 
r c using U + requires an iterative procedure; we used a combination of simple 
iteration and the Van Wijngaarden-Dekker-Brent (VWDB) method [67,68] for 
root finding, which was reasonably effective. The second, and our preferred, 
possibility is to calculate r c exactly using a root finding algorithm, such as 
either Newton-Raphson or VWDB. We used a Newton-Raphson algorithm, 
iterating 



r n = „_i _ (^(Ar/2 + rr^\Q{AT/2 + tT^At/2 + r^)) 
W(AT/2 + T?- 1 )\Q(AT/2 + T?- 1 W(&T/2 + T?- 1 )y 



so that we could improve the accuracy of the eigenvector as the Newton- 
Raphson iteration progressed. The difficultly with bounded methods such as 
bisection or VWDB is that trying to constrain the eigenvector between two 
bounds could lead to a high accuracy eigenvector escaping bounds calculated 
with a low accuracy. Using Newton-Raphson allows us to adjust the accuracy 
of the eigenvector as we proceed with the calculation. This method, of finding 
the exact point where the eigenvalue crosses, has the advantage that it removes 
one source of 0(Ar 2 ) energy conservation violating terms, and it also was 
faster than the simple iterative method on our small trial lattices. However, 
the eigenvector with the smallest eigenvalue needs to be calculated to a very 
high accuracy. This method breaks down if the crossing is close to a minimum 
of the eigenvalue. In this case, it is necessary to use a different algorithm, 
such as Brent's method, to get sufficiently close enough to the solution for 
Newton-Raphson to converge. 



20 



4-4-1 Transmission 

The update scheme proposed for transmission is given by (see algorithm 2): 



II + =n~ + r c (F) - V r c ( V , F) - fTr(F) + 

v k Avt n- - 1 (F~ + f+)) + v k 2 ( v l ir - |(f~ + f + )) ) x 



1 + 



^ (77?, n- - f (f- + f+))2 + ( v l, n- - f (f- + F+))' 



- 1 



(50) 



d k — (4d - 2r c (F- r7)(n-,77) + 2r c (F + , 77) (II + , 77) + r c 2 (F~ + F+, F+ - F")). 

iV 5 



As mentioned earlier, 77^, and 77! must be orthonormal and orthogonal to i] 
and F. We constructed these vectors using the following procedure: 

• We start by taking three unit vectors, a, (3 and 7 corresponding to putting 
one of the eight Gell-Mann matrices on one particular link. 

• We then construct r\[ = a. cos 9 + (5 sin 6 cos + 7 sin 6 sin 0, choosing the 
angles 9 and so that 77^ is normal to i] and F. 

• r/2 is then constructed in the same way from three different Gell-Mann 
matrices on the same link. 

Therefore Ns is the number of links on the lattice. It should be noted that this 
procedure does not lead to a higher transmission rate than other choices. 11 
The change of momentum for any pair of 77— vectors is proportional to l/Afe, 
but there are Ns pairs of vectors, so the overall probability of transmission 
remains constant. Indeed, we will show in a forthcoming paper [61] that (if we 
are updating the momentum normal to rj and assuming that II is Gaussian 
distributed) the probability of transmission is always min(l, exp(— \ Yl dk)) + 
0(1 /V), no matter which area-conserving algorithm is chosen. We investi- 
gate below how this choice of transmission algorithm will affect the numerical 
stability and the transmission rate. 



4-4-2 Reflection 

A reflection takes place whenever one of the momentum updates in the trans- 
mission algorithm would lead to an imaginary momentum. To correct for the 
reflection's O(r) errors, one can use a method similar to the one which we use 
for transmission. Indeed, we used this algorithm for a while, and it worked 

11 Although it is possible to use non-area conserving algorithms with a higher trans- 
mission rate — see [61]. 
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(1) n-(r + Ar/2) = n(r) + At/2I1(t); 

(2) f/-(r + Ar/2) = e i ( Ar / 2 ) n ^ T+A ^ 2 )f/(r); 

(3) C/ c = e iTcn_ f/-; 

(4) The momentum update given in equation 50; 

(5) U+ = e- iT < n+ U c ; 

(6) U(t + Ar) = e i(*T/2)n+(T+AT/2) u+ ( T + Ar/2) . 

(7) n(r + At) = n+(r + Ar/2) + (At/2)I1 + (t + At). 

Algorithm 2. The modified leapfrog algorithm for transmission. 

(1) IT-(t + At/2) = n(r) + (Ar/2)n(r); 

(2) ET(t + At/2) =6^/2)^(^/2)^^. 

(3) U c = e iT ^U-; 

(4) The momentum update given in equation (51); 

(5) U + = e iTcIl+ U c ; 

(6) U(T + At) = e i(Ar/2 ) n+(r+Ar/2 )[/+ ( r + Ar/2) . 

(7) n(r + At) = U + (r + At/2) + (At/2)I1+(t + At). 

Algorithm 3. The modified leapfrog algorithm for reflection. 

well. However, there remains a problem with this method, which is what to 
do when the 0(t c ) error becomes sufficiently large that the square roots in 
the 0(t c ) correction become imaginary. Although this scenario did not occur 
in practice, it is a worrying possibility with no obvious answer. We have sub- 
sequently switched to a method first proposed by Fodor et. al. [65], which is 
based around a different idea. Our current update for reflection (see algorithm 
3) is 

n+ =ir - 277(77, ir) - 2t c (f~) + 277^(77, f~) + 2^iv(f-). (51) 

Note the change of sign in the gauge field update in step (5) of algorithm 3, 
as compared to algorithm 2. 

The sign of the smallest eigenvalue should change for a transmission step, and 
remain the same for reflection. However on a few rare occasions, this did not 
occur. Such odd phenomena happen if the A = surface is not smooth near 
to the point of crossing, so that the eigenvalue might try to cross again in 
the same molecular dynamics step. We corrected for this by adding a second 
correction step (i.e. we repeated steps (3) (4) and (5)) if the sign of the smallest 
eigenvalue did not behave as expected. 

Two different eigenvalues crossing during the same micro-canonical step will 
also cause this algorithm to break down, because of possible mixing between 
the two eigenvectors. This issue is discussed in appendix B. This phenomenon 
will be discussed in a future paper [55]. 
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5 Numerical Results 



In order to test the overlap HMC algorithm numerically on various small lat- 
tices, 4 4 , N F = 2 ensembles with (3 = 5.4 were generated, one set of runs 
with parameter settings k = 0.225 and fi = 0.05,0.1,0.2,0.3,0.4 and 0.5, the 
other with /i = 0.5 and k = 0.18,0.19,0.2,0.21,0.22 and 0.23. We have also 
generated three 6 4 , j3 = 5.6 ensembles, with k = 0.2 and ji = 0.3,0.1,0.05, 
and two 8 4 , (3 = 5.6 ensembles at k = 0.2, and /i — 0.1 or 0.3, although 
we did not analyse the small statistics 8 4 data sets for most of the results 
presented in this section. Phenomenological calculations using the overlap 
operator in the quenched approximation have covered masses in the range 
H ~ 0.007 -> 0.4 [24] or fi ~ 0.015 -> 0.037 [69], roughly a factor of 10 below 
our present mass range. 12 Throughout these simulations, we took advantage of 
relaxation and preconditioning techniques developed in [46,47]. The accuracy 
of the preconditioner and the number of projected eigenvalues were optimised 
for our HMC program, which gave a gain factor of ~ 30. We used anti-periodic 
boundary conditions in the time direction and periodic boundary conditions 
in the spatial directions. 

5.1 Energy conservation and topological index 

5.1.1 The correction step and the number of topological index changes 

Our first concern is to verify the impact of the correction step on energy con- 
servation, i.e. on the size of energy fluctuations during a Monte Carlo run. 
This can be achieved by a comparison of the upper plots of figures 1 and 2, 
which display the energy variations during the individual trajectories for a 
typical run. The lower plots in these figures exhibit the tunnelling histories 
of the systems through the topological sectors. A large spike in the energy 
difference signals a breakdown in energy conservation. Without the correction 
step (figure 1) there are spikes in the energy difference whenever the topologi- 
cal charge changes — which occurs when there is an eigenvalue crossing. This 
confirms our expectation that energy is not conserved when the topological 
index changes if the standard leapfrog algorithm is used. In all subsequent 
plots, we shall use the corrected update. Figure 2 is based on the same pa- 

12 In order to express \i as a physical mass, we used tq = 0.49fm to calculate the 
lattice spacing as approximately a -1 ~ 590MeV on our 6 4 lattices. The renormali- 
sation constant was not possible to measure on these lattices because the error was 
too large, so, for this rough estimate, we will take Z m = 1. This implies that on 
these ensembles a quark mass of [i = 0.05 corresponds to a physical mass of ~ 93 
MeV, i.e. around the value of the strange quark mass, however with a very large 
error. 
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Fig. 1. The energy difference between two micro-canonical steps, and the topological 
index plotted against the trajectory number, for a (3 = 5.4, fi = 0.5, k = 0.225 
ensemble generated without the correction step. 



rameters as figure 1; generally we find that most of the spikes disappear. The 
remaining spikes are caused when a positive and a negative eigenvalue of Q 
both approach, but do not cross, simultaneously. This can cause large mixing 
between the two eigenvectors, and a very large fermionic force. This second 
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Fig. 2. The energy difference between two micro-canonical steps and the topological 
index plotted against the trajectory number for 70 trajectories of the (3 = 5.4, 
{i = 0.5, k = 0.225 ensemble, generated with the correction step. 



effect will not affect the acceptance rate. It should be noted that sometimes 
the topological index bounces back within the same time step, which can cause 
a small discontinuity in the energy (unless, like our code, the HMC algorithm 
is designed to pick up this possibility). 
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Fig. 3. The energy difference between two micro-canonical steps (top), and the 
topological index (bottom) plotted against the trajectory number for 70 trajectories 
of the (3 = 5.4 n = 0.05 k = 0.225 ensemble. We used the correction step to generate 
this ensemble. 
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4 4 n = 0.5 

K n top 


4 4 « = 0.225 

M ra top 


6 4 k = 0.2 

At n top 


8 4 k = 0.2 

M "top 


0.18 0.000(0) 

0.19 0.006(6) 

no n 1 oo /qi ~\ 
U.z U.lzz^ol ) 

0.21 0.579(30) 

0.22 0.901(104) 

0.23 1.21(83) 


0.05 0.045(12) 
0.1 0.103(20) 
U.z U.oyo(oyJ 
0.3 0.567(46) 
0.4 0.827(52) 
0.5 1.12(73) 


0.05 0.040(14) 
0.1 0.074(18) 

0.3 0.547(74) 


0.1 0.153(42) 
0.3 0.969(175) 



Table 1 

The number of topological index changes per trajectory (nt p) for various masses 
on our 4 4 , k = 0.225 ensembles (left), the 4 4 , fi = 0.5 ensembles (middle), and the 
6 4 , k = 0.2 ensembles (right). Note that the 4 4 and 6 4 ensembles were generated at 
different values of k (and (3), so the bare quark mass at constant fi is 20% lower for 
the 6 4 ensembles. 



When we go down to smaller masses, a slightly different picture emerges (see 
figure 3 — note that this figure includes the correction step). We notice that 
the topological index, Qf, changes considerably less frequently at a lower mass 

(see table 1, 2 n< ^-4^ n columns), as we would expect. The Qf ^ configura- 
tions are suppressed as we move to lower masses, and as the mass decreases, 
d increases, 13 leading to a lower probability of transmission. 14 The energy 
difference depends strongly on the topological index: on the 4 4 , fi = 0.05 en- 
sembles, the mass is about the same size as the lowest non-zero eigenvalue, 
implying that the inverse of the overlap operator, and therefore in general 
the fermionic force, will be larger for a Qf ^ configuration. We have only 
observed this effect of the large fermionic forces with non-trivial topology on 
our smallest lattices, although it is possible that it will return in large volumes 
in the e-regime. It can be avoided by working in the chiral sector with no zero 
modes [39,61]. 

Finally, it can be seen that reducing k (and therefore reducing the Wilson 
mass mo and as a consequence the bare fermion mass) generally reduces the 
number of crossings (see figure 4 and table 1 (first column)). As expected, 
see [70,71], there are no changes in the topological index below the critical 
value of k, which is around 0.2 on the 4 4 lattices. We confirm that the overlap 
operator has very few small eigenvalues below the critical value of k [72]. 



13 More precisely d oc fi 2 . 

14 The probability of transmission is proportional to min(l, exp(— 2d)). 
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Fig. 4. The energy difference between two micro-canonical steps, and the topological 
index plotted against the trajectory number for a section of the (3 = 5.4 [i = 0.5 
k = 0.2 ensemble. 



5.1.2 Complexity of correction step 



To investigate the At dependence of the energy conservation violation for 
the correction step, we took 100 6 4 , fx = 0.3 configurations, containing 35 
transmission correction steps and 123 reflection correction steps. We ran the 
molecular dynamics as normal (with At = 1/30) up to the eigenvalue cross- 
ing. We then reduced the time step for the molecular dynamics to Ar/n ( , 
running n t — 1 normal leapfrog updates and one modified leapfrog update. 
We calculated the absolute value of the error in the energy for the modified 
leapfrog step only and averaged over the configurations. This procedure en- 
sures that on each configuration the gauge field and momentum at the crossing 
have only a small dependence on rit- The energy differences for our procedures 
(algorithm 2) and the algorithm proposed in [26,43] are given in tables 2 for 
the transmission step and 3 for the reflection step. We expect (see appendix 
A. 3) an 0(r c ) error for the algorithm proposed in [26,43] and an 0(r c 2 ) error 
for the algorithm presented in this paper, where — Ar/(2n t ) < r c < Ar/(2n t ). 
Fitting the results to the form a n t ai gave the results as presented in table 4. 

First, consider the results for the algorithm presented in [26,43]. Comparing 
the raw data with At and At 2 in tables 2 and 3 suggests that the energy 
conservation is dominated by O(At) terms. This is also indicated by the results 
of the fits in table 4. Furthermore, AE q /t c is large and constant (there are 
still some large contributions from the standard leapfrog part of our algorithm 
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nt 


1/nt 


1/n? 


|AB | 






|AB| 


AE 




A£ n (n t =l) 


AE(n t = l) 


1 


1.000 


1.000 


0.120(21) 


1.000(172) 


0.50(9) 


0.059(6) 


1.000(105) 


0.83(33) 


2 


0.500 


0.250 


0.050(9) 


0.414(76) 


0.37(5) 


0.009(1) 


0.154(25) 


0.14(4) 


3 


0.333 


0.111 


0.031(7) 


0.261(57) 


0.32(5) 


0.0030(7) 


0.051(12) 


0.16(8) 


4 


0.250 


0.063 


0.019(4) 


0.154(32) 


0.31(4) 


0.0016(4) 


0.027(7) 


0.21(14) 


5 


0.200 


0.040 


0.015(3) 


0.122(27) 


0.31(5) 


0.0009(2) 


0.015(3) 


0.032(8) 


6 


0.167 


0.028 


0.013(3) 


0.107(27) 


0.35(6) 


0.0005(1) 


0.008(2) 


0.018(4) 


7 


0.143 


0.020 


0.011(2) 


0.090(19) 


0.34(5) 


0.0004(1) 


0.007(2) 


0.019(4) 



Table 2 

The energy difference AE for the transmission correction algorithm 2 compared 
with the molecular dynamics time step, Ar = l/30n t . For comparison, results for 
the algorithm given in [26], here denoted as AEb, are also included. 











[26] 






section 4 




nt 


l/n t 




|AEb| 


AB 


Ar^o 


|AB| 


AE 


ArM 


AB n (n t =l) 




AE(n t =l) 




1 


1.000 


1.000 


0.296(30) 


1.000(101) 


1.35(12) 


0.105(23) 


1.000(215) 


0.94(23) 


2 


0.500 


0.250 


0.133(13) 


0.451(44) 


1.16(10) 


0.0114(30) 


0.109(28) 


0.16(3) 


3 


0.333 


0.111 


0.099(11) 


0.335(37) 


1.12(9) 


0.0030(5) 


0.028(4) 


0.08(2) 


4 


0.250 


0.063 


0.064(7) 


0.217(25) 


1.16(10) 


0.0027(16) 


0.026(2) 


0.08(2) 


5 


0.200 


0.040 


0.059(6) 


0.198(19) 


1.16(9) 


0.0011(4) 


0.011(4) 


0.07(3) 


6 


0.167 


0.028 


0.045(5) 


0.151(16) 


1.12(9) 


0.00049(9) 


0.005(1) 


0.03(1) 


7 


0.143 


0.020 


0.036(4) 


0.123(14) 


1.12(9) 


0.00037(8) 


0.004(1) 


0.06(2) 



Table 3 

The energy difference AE for the reflection correction algorithm 2 compared with 
the molecular dynamics time step, Ar = l/30n t . For comparison, results for the 
algorithm given in [26], here denoted as AEq, are also included. 





a oi x 2 


AE ([26]) 
AE (section 4) 


O.H9(i:^6 7 ) -1-271(1^1) 0.52 
0.058(±°:8») -2.642(±°;i£) 1.11 


AE ([26]) 
AE (section 4) 


0.291(±ffi) -1-044(1^92) 0.92 
0.083(1°;°!) -2-879(1°;^) 4.01 



Table 4 

Fits and x 2 values for the data presented in tables 2 and 3 for the functional form 
AE = aorit ai for transmission (top) and reflection (bottom). The errors are the 68% 
confidence levels for the fit. 

at n t — 1), suggesting that the violation in energy conservation is dominated 
by a large O(t ) term. For the energy differences given by our algorithm, a 
different picture emerges: the violation in energy conservation is dominated 
by the 0(Ar 3 ) terms from the normal leapfrog part (steps 1, 2, 6 and 7) of the 
algorithm, while the O(r^) error which we expect from the correction steps 
(steps 3-5 in algorithm 2) is small. This is an encouraging result that deserves 
to be investigated on larger lattices and smaller masses. 
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M 


<S g > 


t j 




K 


<S g > 








0.05 


0.5754(10) 


25.05(tf 2 ;g) 


-0.0716(40) 


0.18 


0.5485(14) 


6.3i(±i;f ) 


-0.0137(31) 




0.1 


0.5733(7) 


12.76(ti£) 


-0.0778(51) 


0.19 


0.5522(8) 


3-77(lS:S) 


-0.0153(58) 




0.2 


0.5678(14) 


11.90(±|;1?) 


-0.0506(61) 


0.2 


0.5555(10) 


6.71(1^) 


-0.0054(50) 




0.3 


0.5638(8) 


4.60(±^) 


-0.0376(55) 


0.21 


0.5573(8) 


i-82(lg : S) 


-0.0241(80) 




0.4 


0.5570(9) 


3.98(lg;S) 


-0.0223(65) 


0.22 


0.5560(11) 


i-88(i; : y) 


-0.0266(94) 




0.5 


0.5523(13) 


4-46(i° ;S) 


-0.0050(80) 


0.23 


0.5528(14) 


5.26(+J 8 36 ) 


-0.0015(69) 


Table 5 



The average gauge energy per lattice site, < S g >, the real part of the Polyakov loop 
Pi and jackknife auto-correlation lengths for the plaquette, for the 4 4 , k = 0.225 
ensembles (left) and the fi = 0.5 ensembles (right). 



5.2 Polyakov loop and plaquette 

Next we calculate several observables to check that we get sensible numbers. 
Here we look at the value of the plaquette — or, equivalently, the average gauge 
energy per lattice site, S g — (l — 2^z{U^ v + U^ u )^ — and the real part of the 
Polyakov loop. 

The values of the gauge energy per lattice site and the Polyakov loop for one 
of our ensembles are plotted against computer time in figure 5 (other ensem- 
bles are similar). Both the plaquette and the Polyakov loop are stable over 
computer time. The average Polyakov loop, given in tables 5 and 6 is small for 
all our configurations, suggesting that the configurations are all confined. The 
average value of S g per lattice site is given in tables 5 and 6. The statistical 
errors on S g shown in these tables were obtained by calculating the jack- 
knife error af^ with rij consecutive configurations in each bin. The jackknife 
method should give a stable value when rij ^> Tj, the auto-correlation length. 

By fitting the jackknife errors to the curve crj"^ = a + b(l — e~ nj / Tj ) (the errors 

on were calculated using the bootstrap method) we can get an estimate 
of the value of the plateau (<7j = a + ft), and a measure, Tj, of the exponential 
auto-correlation length. On the 4 4 lattices, the auto-correlation length is small 
at the larger masses, but increases roughly as 1/ ji as we decrease the mass. On 
the larger lattices, the auto-correlation time is more stable with the mass. We 
were not able to obtain a reliable estimate of the integrated auto-correlation 
times on such small lattices. 

5.3 Topological susceptibility 

The average values of the topological index, Q f = n_ — n + (n± is the number 
of positive/negative chirality zero modes), and of Q 2 ^ are given in tables 7 and 
8 and are plotted in figure 6. The tables show that the average values of Qf are 
generally close to 0, suggesting that there is no bias towards either a positive 
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0.59 




-0.3 - 



-0.4 1 ' ' ' ' ' ' 1 ' 1 1 

50 100 150 200 250 300 350 400 450 500 

trajectory 

Fig. 5. The gauge energy per lattice site (top) and Polyakov loop (bottom) for each 
trajectory on the 4 4 ,fi = 0.5, k = 0.225 ensemble. 

or negative topological index. 

The topological susceptibility, which is proportional to < Qj- >, can be related 
to the quark mass using chiral perturbation theory [73-75]. It is expected that 
at low quark masses on large enough volumes the topological susceptibility 
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A 4 


< bg> Tj Fl 


0.05 

0.1 

0.3 


0.5353(10) 9.97(+};^g) -0.0082(48) 
0.5350(6) 8.59(1^) -0.0049(33) 
0.5230(19) 10.93(1^1) -0.0018(28) 



Table 6 

The average gauge energy per lattice site, < S g >, the real part of the Polyakov 
loop Pi and jackknife auto-correlation lengths for the 6 4 ensembles. 





<Q)> 


< Qf > 


^conf 


K 


<Q}> 


<Qf> 


^-conf 


0.05 0.19 


0.019(6) 


-0.011(5) 


734 


0.18 


0.000(0) 


0.000(0) 


459 


0.1 0.40 


0.118(18) 


0.021(33) 


959 


0.19 


0.003(4) 


0.004(4) 


733 


0.2 0.89 


0.343(36) 


0.044(34) 


868 


0.2 


0.044(16) 


-0.001(14) 


760 


0.3 1.52 


0.421(41) 


-0.085(35) 


967 


0.21 


0.255(35) 


0.014(41) 


442 


0.4 2.37 


0.507(46) 


0.034(39) 


905 


0.22 


0.461(44) 


-0.047(62) 


706 


0.5 3.56 


0.480(62) 


0.057(47) 


523 


0.23 


0.667(89) 


0.088(66) 


433 



Table 7 



The average values of the topological index Qf, Q"j, and the number of configura- 
tions for the 4 4 , k = 0.225 ensembles(left) and the 4 4 , \i = 0.5 ensembles (right). 
nib is the bare fermion mass, given by equation (10). 

should be proportional to the quark mass, the square of the pion mass: 

There have been several attempts to verify this relation in recent years (for 
recent examples, see [76,77]), with mixed success. At larger quark masses, 
the topological susceptibility should tend asymptotically towards its quenched 
value. The transition between the two forms, in a large volume, is expected to 
be around 50-90 MeV [78] . Our bare quark masses, which we estimate (based 
on our early calculations of the lattice spacing on our 6 4 ensembles) range from 
about 100 MeV to a few GeV and should be in the transitional region between 
the two known limits: we cannot expect to see a linear decrease with the quark 
mass. However, we recognize a decrease in the topological susceptibility, and 
our results are not inconsistent with the expected functional form, although 
again we emphasise that our lattices are far too small to draw any meaningful 
conclusions. 
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fi mi, 


< V/ > < V/ > n conf 


0.05 0.19 
0.1 0.40 
0.3 1.52 


0.063(21) -0.008(21) 377 
0.138(22) 0.026(20) 350 
0.493(62) 0.086(61) 420 



Table 8 

The average values of the topological index Qf, Q 2 , and the number of configura- 
tions for the 6 4 ensembles, is the bare fermion mass. 



condition 


^COnf 


AR 


"top 


< Q) > 


<S g > 


<Pi > 


\d\ > 0.5 


709 


81.9% 


1.048(68) 


0.354(35) 


0.5625 


-0.0325 


\d\ > 0.45 


855 


84.1% 


0.978(57) 


0.326(26) 


0.5653 


-0.0286 


\d\ > 0.4 


864 


86.2% 


0.920(60) 


0.317(33) 


0.5640 


-0.0377 


\d\ > 0.35 


460 


86.5% 


0.843(74) 


0.350(33) 


0.5645 


-0.0270 


\d\ > 0.3 


705 


87.8% 


0.688(59) 


0.311(35) 


0.5646 


-0.0300 


\d\ > 0.25 


698 


92.0% 


0.598(50) 


0.317(34) 


0.5657 


-0.0431 


|d| > 0.2 


604 


96.0% 


0.439(55) 


0.285(31) 


0.5653 


-0.0353 


1 + YM^ >0 


768 


97.8% 


1.047(62) 


0.357(32) 


0.5639 


-0.0357 



Table 9 

The acceptance rate, number of topological index changes per trajectory, n top , and 
average values of Q 2 , S g and Pi for 4 4 ensembles generated at fi = 0.3, k = 0.225 
and (3 = 5.4, but with different reflection conditions. 

5.4 Testing the area conservation of the reflection condition. 

As remarked earlier, the reflection condition applied is to be tested numeri- 
cally in order to verify that area conservation is fulfilled and no systematic 
errors are spoiling the results. We compared the results from our algorithm 
with the results from a second algorithm which explicitly satisfies detailed 
balance, but where the molecular dynamics does not always conserve energy. 
This algorithm reflected whenever d was smaller than a constant d max and 
used no correction step for transmission. When d was larger than this con- 
stant, it violated energy conservation. A high <i max leads to many topological 
index changes and therefore a short topological auto-correlation length but 
has a low acceptance rate. A low d max would have a higher acceptance rate 
but fewer topological index changes. 

In table 9, we show how a varying d max affects the acceptance rate and the 
number of topological index changes per trajectory. Using the exact area- 
conserving algorithm has a cost in either the acceptance rate or rate of topo- 
logical charge changes. Setting d max = 0.5 has a similar number of topological 
index changes per trajectory, but at a cost of 15% in the acceptance rate. Low- 
ering d max so that the acceptance rate is similar to the momentum- dependent 
algorithm reduces the number of topological index changes by over a factor of 
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Fig. 6. The ensemble average of Q'j plotted against bare quark mass on the 6 4 (top) 
and 4 4 (bottom) ensembles. 

2, and increases the auto-correlation correspondingly. 



The table shows that the average values of plaquette, topological susceptibility 
and Polyakov loop are independent of the reflection condition. We therefore 
feel confirmed in assuming that a potential systematic error is extremely small. 
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However, this issue will need to be checked again when we move to larger 
lattices at smaller masses. 



5.4-1 Volume scaling of the overlap EMC 

The authors of Ref. [57] remark that the algorithm might scale with the square 
of the lattice volume. Their argument goes as follows: 

(1) The density of small eigenvalues should be independent of the volume; 
therefore the number of small eigenvalues scales with the volume. 

(2) The number of Wilson eigenvalues attempting to cross the zero line will 
be proportional to the density of small eigenvalues. Therefore, there will 
be O(V) correction steps. 15 

(3) In total, the algorithm might scale as V 2 . 

There are, however, arguments against this reasoning: The density of small 
eigenvalues is not just a function of the lattice volume as point (2) suggests, 
but also the lattice spacing and the fermion mass. Thus, concluding that the 
number of correction steps will be proportional to the volume is too simplistic, 
as on larger volumes we are also likely to change the other parameters in the 
action. One could use the same argument to suggest that the algorithm will 
accelerate as the lattice spacing decreases, since there will be fewer attempted 
crossings. More importantly, this argument does not take into account the ef- 
fect of the auto-correlation. Our numerical experience shows that the overlap 
eigenvalues are remarkably stable under the molecular dynamics, except when 
there is a topological index change. Our results [80] demonstrated that small 
non-zero eigenvalues of the overlap operator are only generated significantly 
faster through the mechanism of two topological index changes rather than 
the normal evolution of the molecular dynamics. Thus the auto-correlation 
time depends strongly on the rate of topological index changes, which in turn, 
assuming that the transmission rate is large enough, increases proportional to 
the rate of attempted eigenvalue crossings. Thus the more attempted topolog- 
ical index changes, the shorter the auto-correlation times should be. 

Nonetheless, we expect problems with the condition number of both the over- 
lap and Wilson operators increasing as the volume increases, leading to a 
slowing down in both the eigenvalue and inversion routines. 

Results from test runs on lattices ranging from size 4 4 to 16 3 32 show that 
the number of required correction steps is certainly not proportional to the 
volume; still it is increasing slightly. 



See the results of [79]; we are intending to test these results. 
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0.21 


2192(100) 


30 


91% 


0.4 


1836(180) 


30 


87% 


0.22 


2485(130) 


30 


90% 


0.5 


1972(220) 


30 


90% 


0.23 


1937(190) 


30 


90% 



Table 10 

The average timings for one trajectory (in seconds), the number of molecular dy- 
namics steps and the acceptance rate for the k = 0.225 ensembles (left), and the 
pi, = 0.5 ensembles (right). The total time of the trajectory, ATn m d, is equal to 1. 

5. 5 Computer performance 

The timings to generate a 4 4 trajectory are shown in table 10. They are based 
on running on four nodes of the cluster computer ALiCE at Wuppertal Univer- 
sity. The 6 4 ensembles (table 11) were generated on eight nodes and the 8 4 en- 
sembles on 16 nodes. 16 We used CG with relaxation and preconditioning [47] 
to perform the inversion, which has approximately a factor of 4 performance 
gain over the standard CG. Our timings are quite independent of the mass, 
although we expect this fact to change on larger and less well conditioned sys- 
tems. However, the scaling of the computer time with mass will be better than 
the n~ 3 behaviour which we assume for Wilson or staggered fermions because 
with overlap fermions there are fewer crossings for small masses. It can be 
seen by comparing the timings on the \i = 0.5 ensembles (table 10) below and 
above the critical k that at this large mass the time needed for the treatment 
of the crossings was around 40% of the total computer time (there are usu- 
ally 3-4 reflection or transmission steps per trajectory at fi = 0.5 and k well 
above the critical k). The acceptance rate was stable for all our 4 4 ensembles. 
We increased the number of molecular dynamics steps (n m d) to 50 (keeping 
Arn md = 1) for the 4 4 ensemble generated at fj, = 0.05 and k = 0.225 in order 
to counteract the large fermionic force in non-trivial topological sectors (see 
section 5.1), but otherwise we did not have to change the trajectory length as 
we changed the mass to maintain the acceptance rate. On our larger lattices, 
we had to increase n md at lower masses. 



In parallel computations, we estimated — by explicitly calculating the number 
of floating point operations for one trajectory — that ALiCE runs at around 0.4 
GFlops/node for the 4 4 ensembles and 0.7 GFlops/node for the 6 4 and 8 4 ensem- 
bles. 
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40 
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Table 11 

The average timings for one trajectory (in seconds), the number of molecular dy- 
namics steps, and the acceptance rate on the larger lattices, with k = 0.2. The total 
time of the trajectory, is equal to 1 . 

6 Conclusions and Outlook 

In this paper, we have presented the basics of an improved simulation scheme 
for full QCD including the chirally symmetric overlap fermion discretization. 
We have constructed an exact Hybrid Monte Carlo algorithm being capable of 
generating gauge field configurations with two flavours of dynamical overlap 
fermions. The HMC algorithm proposed aims at the exploration of the low 
quark mass regime of lattice QCD with a lattice action observing exact chiral 
symmetry. The method can be extended to solve the problem of low modes 
that are mixing as well as to improve topological tunnelling rates. 

The main conceptual difficulty constructing a dynamical overlap algorithm is 
the appearance of a Dirac delta function within the fermionic force. A naive 
straightforward integration leads to action differences from the discrete leap 
frog integration which turn out to be much too large to allow acceptance in 
the global Monte Carlo decision step at the end of a trajectory. These large 
action differences are caused by zero-mode crossings of the low eigenvalues of 
the Wilson kernel used within the overlap operator. They force the algorithm 
to slow down and to get stuck eventually. 

In order to avoid the action violations as caused by eigenvalue crossings, we 
have developed an explicit integration procedure for evolving the low crossing 
modes within the molecular dynamics part of the HMC In case of reflection 
on the action wall the change in energy is zero by definition and also in case of 
transmission a quite smooth integration over the singularity is achieved. The 
new scheme can cope with multiple eigenvalue crossings within the same MD 
time step. 

The proposed HMC for overlap fermions has several advantages: 

- The action differences are small enough to allow for reasonable acceptance 
rates of the HMC 
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- The scheme is constructed to allow for a systematic improvement of the 
complexity of the action violation as arising from a finite integration time- 
step. 

- The action violation of the specific realization constructed in this paper is 
comparable to the 0(At 2 ) error of the standard HMC leap-frog integration 
over the entire trajectory. 

- The algorithm is shown to satisfy reversibility and area conservation. 

- The universal ansatz in this paper for the correction step in terms of or- 
thonormal vectors rjk has the potential to construct a non- area- conserving 
higher-order integration method. This approach has the potential to solve 
the problem of the mixing of low modes that occasionally leads to large 
action violations on larger lattices, spoiling the acceptance rate even after 
correction of single mode zero crossings. Such an improved scheme, based 
on the present approach, will be described in a subsequent paper [55]. 

- In addition, the universal ansatz of this paper for the correction step is part 
of the improvement of the tunnelling of the topological index to be described 
in an upcoming paper [63]. 

We have tested the algorithm on Teraflop- Class computers with lattice sizes 
ranging from 4 4 to 8 4 , calculating a set of basic physical observables. Despite of 
large finite-size effects appearing on such small lattices, we do get results for, 
e.g., the topological susceptibility, which behave qualitatively as expected. 
This demonstrates that intermediate lattices are in reach of Teraflop-class 
computers. 

With the help of the new algorithm and improvements to be presented in 
three forthcoming papers [55,63,64], larger lattice sizes, required to realize 
physically meaningful systems, will become accessible on upcoming Petaflop- 
class systems. 
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Appendix 



A Detailed Balance and the Size of the Correction Step Error 



In this section we prove that the general correction update (46) satisfies de- 
tailed balance. Extending this proof to the slightly modified update in equa- 
tions (50) and (51) is straightforward. 



A.l Detailed balance and reversibility 



The initial and final leap frog steps (steps 1 and 2, and 6 and 7) in algorithm 
2 are known to be reversible. Thus, it just has to be shown that the correction 
steps (steps 3, 4 and 5) are reversible. Here, the proof of reversibility for the 
correction step for the case of transmission in algorithm 2 is given. The proof 
for the reflection update is similar. 



A. 1.1 The momentum update 



It is assumed that when reversing the sign of At, dk — > — dk V/c. This condition 
is satisfied for our algorithms, because r c and rf k remain constant, F + — > F~ 
and d — > — d, since the sign of the smallest eigenvalue is opposite for the 



39 



reverse update. Our general transmission momentum update (46) reads: 



n+ =IT - r c (F~ - F+) + T c rj(rj : F~ - F + ) + -Tr(F _ - F + ) + 



77(11-77) 



1 + 



1 



^ (H-,r7) : 

+ E (v k M, n- - + f + )) + n- - |(f- + f+))) 



X 



1 + 



^ (r)?, n- - f (F- + + fa*, n- - T f(F- + F+)Y< 



(A.l) 



so that 



(11+17) = (IT, 17), 



1 + 



\ (n",r7) 2 ' 



(n + + |(f- + f+), % ) = fa fci , n- - ^(F~ + F+)) x 



1 + 



\ (77*, n- - f (f- + f+))2 + fa*, n- - f (f- + f+))' 



- 1 



(A.2) 



The time-reverse update, with At — > — Ar, is given by 



n'_ =n+ + r c (F+ - F-) - r c ri(r], F + - F~) - -Tr(F~ - F+) + 



1 - 



\ (n+17)' 



-1 + 



£ (77^(77^, n+ - |(f- + f + )) + n + - 1 (f- + f + )) ) x 



fa*, n+ - f (f- + f+)) 2 + fa*, n+ - f (f- + f+)) 2 J • 

(A.3) 
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We note that 
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di 



- 1 



=(ir, id. 



i + 



d\ 



i - 



- 1 



=(n- i,) l- 



i + 



d\ 



\ (n-,17) 



2 ' 



(A.4) 



(n + --(F- + F+),?7^ 



4 



\| faf, n+ + f (f- + f+)) 2 + ( % fe , n+ - f (f- + f+)) 5 
= -(n--^(F- + F + ),^)x 



1 + 



^ (77* , n- - f (f- + f+)) 2 + fa*, n- - f (f- + F+)) ; 



1 . 



(A.5) 



Combining (A.l), (A. 3), (A.4) and (A.5) gives W_ = IT"; therefore, the mo- 
mentum update is reversible. 



A. 1.2 The gauge field update 
The gauge field update is given by: 



U + = 



Switching At — > — At leads to 



U'_ = e- iTcIl ~e iTcU+ U + , 



(A.6) 



(A.7) 



which gives U'_ = U ; therefore, the gauge field update is reversible. 



A. 2 Detailed Balance: Area Conservation 



A. 2.1 Transmission 

The transmission update, given by algorithm 2 and using equation (46) can be 
shown to be area conserving. For simplicity, it is assumed that d\ is a function 
of the gauge field at the crossing only and has no additional dependence on 
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II, and all the other are functions of the gauge field of the crossing and 
(77, n) only. The matrices 77J- and the fermionic forces F ± are just functions 
of the gauge field at the crossing. These conditions can be relaxed, but not 
necessarily for the purpose of this paper. 

Di is a derivative of the gauge fields U, defined as 

DJ(U) =jim^- f(e^u), (A.8) 

where Tj are some suitable generators of the S\J(N C ) gauge group (e.g. in 
SU(3), Ti = Aj/2, where \ are the Gell-Mann matrices). One can write the 
gauge fields in terms of Nq — 1 parameters r\ so that 

DJ(U) = ^-f(U). (A.9) 

Similarly, one can define coordinates for the momentum II and the matrix rj 
in terms of the generators T: 

U,(x) =7rf% rj,(x) = V rT l} F ± - ±Tr(F ± ) =fgT h 

»£» "><" 7 '- (A.10) 



x refers to the lattice site, and /i is a direction index. The normalization 

k,xfi k,x 
n j,i 



condition (77,77) = 1 implies that is normalized according to nf^n^ 1 ' 



For notational convenience, we shall define: 



E (nff(£/)JV£(tf, n) + rJ^{U)N^U, II)) 



1+ (N&(u,iL)y + (N* 2 (u,n)y V' (A '^ 



=i (<f (vrf - |(/- - /-))) . (A.12) 
The generator of a time update At is 



-e 

AtH 



d 1 g_E a 



(A.13) 
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For the standard leapfrog update, we define the operators 



At pi 8 

p — e 2 1 

Q=e^ n ^, (A.14) 

then we have 

Pf(n,r) =f( Tr + F'^,r) Qffar) =f(n,r + ^-n). (A.15) 

F' is the total force in the momentum update (the sum of the gauge field and 
fermionic forces). The standard leapfrog update is given by 

pQQp = e *T(H+0(Ar*)) (A16) 

PQQP is manifestly area conserving, because the operators P and Q are area 
conserving. 

For the correction step, we need to define two new update operators, P c and 
Q c , so that <5 c /(7r, r) = f(ir, r - t c tv) and P c f(ir, r) = f(ir + N h r). P c and Q c 
are 



d 1 d 2 1 ^ 
P c =1 + iV,— + ^AW^ + -NiNjN*- + . . . 



(9 1 d 2 1 <9 3 

Q c =1 - T- c 7Ti— + -T^TTiTTj- h ~ 7~ 3 7T j 7Tj 7Tfc — h . . . . (A.17) 

dr,i 2 OTirj 6 OTjVjT^ 

Because t c ] ^ and [d/dr i: Nj] ^ 0, we cannot factorise P c and Q c 

into a simple exponential form, although using equation (A. 9) we can write 

QJJ = e -iTc*iTi jj = e -ir c U- ( A lg ^ 

These operators by themselves are not area conserving. The full correction 
update is Q\P C Q C : 



QtP&n- =n- + ^(nT *? ) Ui + 



4 * - 1 + 



{j]m (T c )7Tm 



' i : - F ~) - \wiT(U - f-)f+ 



Q\P C Q C U- =QlP c e-^ u 'U- 

^Ql e - iT -( n ~+ N ( u ~' n ~))U~ 

=e - i r c (n_ + AT([/ c> n_)) ei r c n- [/ - = f/+) (Aig) 
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where d, f, r\ etc. are all calculated at the crossing gauge field U c . Therefore, 
the update in algorithm 2 can be written as PQqJ P c Q c QP. We know that 
P and Q are area conserving, so we just need to show that Q c P C Q C is also 
area conserving. We write the updated gauge field and momentum as q and 
p respectively (recalling that the initial fields were r and n in our notation). 
This gives us 

pT = QcPAc^T = tt?" + \r)T^T^T {^ l + ^Mkf ~ X ) ~ 

E (nffN* + njf JV* ) (/l+^TMF ~ ') 5 (A ' 20) 
qT = QcPcQcT = rf + r c (nr ~ P7)- (A.21) 
It has to be shown that the determinant 



J 



dm 

dpi dgj 
dr m dr m 



1. 



(A.22) 



r c is the gauge field at the eigenvalue crossing: 

r x c t = QctT = rT + TrfVc- (A.23) 

Thus, we can write 



zp zp 



dr 1 ( dr yv dn yu 

On? n zp V zp \dn? h ' k n? 1 c 



We choose rj so that 



Hence 



(A.24) 



€ - TcTfiT + xJT {rT - T y » - T c 7T yl/S ) ■ , (A.25) 



dr 1 / ' dr yv dii yv \ 

— ~ ' —V*" - 4" + St W - r yv - tjTV . (A.26) 



drf 1 ^ = ~d^ r]i = °' (A.27) 



o o ecu 

dTc r - -r % f A 28) 



Since r c is a point on the surface of constant A = 0, one solution to (A.27) is 
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that rji is normal to this surface, i.e. proportional to the vector 



x -" CI 



( dU dW \ 

- «#Wl75 I (1 -l»)-^xflSx,y+»+ ( 1 + l»)-Q^Sx,y-„ I mV))- 



(A.29) 



We also have for any function g of r + nr c (such as r] or d), 



dg 



dg 



(A.30) 



We are now in a position to write down the components of J by differentiating 
(A. 20) and (A. 21). By writing p and q as functions of r c and n, we have 



dpi 



Xfl v 



dq: 



dqT 
drZ 



'dp. 



Xfl * 



dp 



dlTm 



dp: 



X/J, 1 



Xfl\ 



+ r c 



dr v Z 



u tm u xy u fiv ~ 2 It lm 



\dirZ 



1 



V 1 + 



- 1 + TV 



9Pi 



1 + liVe2j ((w e fc 1 ) 2 +(Jv e fc 2 ) 2 ) 2 



-1 + 



c n yu 

orh, 
\ 



+ 



1 I dk 



+ 



k,xa k,yu 



/ 



1 + (^el) P^)2 
! ^ /l _ 



+ (JV* 2 )2)2 



<4 



^ V - ^ / fc,a;/i fc,J/f I k,x/j, k,yu\ 



el Jy e2((Affc i )2 +{A rfe 2)2) 2 



1 



k,yu 
m 



_ddk_\ 



r c 



1 + 



dfc 

(iV*)2 + (JV* 2 )2 



= - T r 



dp: 



dn 



+ r c 



' r\ XLL r\ ; 

oqi , _ op. 



<9r 



m 



7] 



yv 



7T, 



(9r 

Pl '~ Tc drZ' 



+ 

(A.31) 

(A.32) 
(A.33) 
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Therefore, the Jacobian is 



J 



dpi dqj 

d7T k 0n k 

dpi dq l 

dr k dr k 



dpi dqj 



+ r c 



dw k 



dpi dq l _|_ r dpt_ 

dr k 9r k c dr k 



dpi _ T §Pi_ dq±_ I dn_ _ ( dqt_ 

d-w k c dr k div k c d-w k c \dr k 



dpi 
dn k 

dpi 
9r k 



dr k c dr k 



dpi 



Tc 8r k ) 



(A.34) 



So J is the product of the two determinants 



dpi 



dpi 



8-Kr, 



dr r 



1 + 



4d, 



(A.35) 



and 



dqi dpi 
h r c - — 



dr 



dr r , 



'1 + 



4di 



{7Tn P f]n P ) 2 



(A.36) 



The two determinants (equations (A.35) and (A.36)) can easily be calculated 
by using the relations |<% + NfM ab N b j\ = \5 ab + M a6 | (where iVfiVf = 5 ab and 
A^ a are real) and \5ij + ViUj\ = 1 + ViUi, and by recalling that rf^ and n{'^ 
are normalised to 2. Note that equation (A.36) would not hold if we included 
terms in the fermionic force parallel to rj. Multiplying the two determinants 
together gives J = 1. Therefore, this procedure is area conserving. 



A.2.2 Reflection 



The proof that the reflection algorithm is area conserving follows the same 
line as that for transmission: instead of equations (A.35) and (A.36), we get 



dpi dpi 



dix m 



dr r 



+ r ( 



dr m 
dpi 



dr r , 



- 1 



(A.37) 
(A.38) 



which gives J = — 1. 



A. 2. 3 The combined update step 

There is one further requirement for area conservation to be satisfied. It is 
not sufficient for the standard update and the corrected update to be area 
conserving separately; the combination of them which we use in our molecular 
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dynamics procedure must also conserve the measure. We define Fp and 
Fft' u as the momentum updates for transmission and reflection respectively. 
We write the transmission condition as x(U~ , Uc) > 0, so we transmit when 
this condition holds and reflect when x < 0. The question remains what we 
should do at x = 0, and that is what we shall address in this section, x is a 
function of the initial momentum and the gauge field at the crossing (which 
is itself a function of the initial momentum and the initial gauge field, as 
discussed above). We can therefore write the momentum and gauge updates 
combining the reflection and transmission updates as 



Pt =F*e (x) + (i - e (x)) Fg, 

q t =F^9 (x) + (1 - 9 (x)) Fg. 



(A.39) 
(A.40) 



9(x) is the step function, i.e. 9 = 1 for x > and 9{x) = for x < 0. We 
can now differentiate these equations to get the Jacobian (using the same 
determinant manipulation as before in an attempt to bring one of the sub- 
determinants to zero): 



J 



dpi 
d-Kj 


dqi . 




hi 


hi 


dpi 
drj 


dqi 
drj 




hi 


J 22 



J 



11 




(A.41) 



dFg 



dr4 



+ 



- Fg) 



(A.42) 



J 



12 



--9{x) 



'd_F% 



Tc— V T c 



2dF% 



dr. 



[i-e(z))( dF % 



dTTj 




9(x)(l - 9(x)) \ 



_ 0i% | ^dF% | r dFg 
drj c drj dr^ 



dx 



dx dr c dx dr c 



dr c diTj dr c dr 



■Tc 



- T 



+ 



2dF%; 
dr^ 



+ 



dx dr c 
dr c drj 



dx " 



21 



r) P n Ft F n 

f)(x)^ + (l-9(x))^ + S(x) (F*-F* 



dr-j 



dx dr c 
dr c drj 



(A.43) 
(A.44) 
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HF?, - Fg) + r c (20( x ) - 1) - F%)Jj . (A.45) 



The [29 (x) — l]-terms are caused by the different updates of the gauge field 
in the algorithms for reflection and transmission. In one case, we need to add 
the first column to the second column in the determinant calculation; in the 
other case we need to subtract. These terms, when they are multiplied by the 
step functions in p and q, additionally lead to the factors of [9(x)(l — 9(x))]. 
[9(x)(l — 0(:c))]-terms will not contribute to the Jacobian unless multiplied 
by a (^-function (even though they will in practice be multiplied by factors 
containing \j \fx): they are finite at x = and zero everywhere else: they will 
not contribute to an integral (we define D(x) as a function which is 1 at x = 
and zero elsewhere. This can be seen simply by noting that the integration 
over these terms (either in terms of the gauge fields or the momentum) is zero. 
Alternatively, this can be shown by an explicit calculation using a particular 
momentum update. For example, using the algorithm presented in [26], which 
has di = Ad and x = l+4d/ (rj, IT") 2 (and a slightly different reflection routine) 
one can demonstrate that the total Jacobian, including these terms, is 



J = 1 + (1 - xf^ " w ~ i ; T ^ w = 1 - 25{x). (A.46) 




The rest of the determinant can be simplified by using the results of the 
previous two sections, particularly the result r c dr c /dri = dr c /dxi. The non-5- 
function terms in J i2 cancel immediately as before. Because the terms multi- 
plying the (^-functions are the outer product of two matrices, we can remove 
all but one of these functions by determinant manipulation. For example, by 
subtracting dx/d7TjI5)(x)/(dx/dr c dr c /dr multiplied by Jn and Jyi from the 
J 2 i and J 2 2 allows us to remove the (^-functions in those terms, and a similar 
manipulation allows the removal of the 5-function in J 12 , so that the only 
5-function which remains is contained within J n . We can use a Schur decom- 
position to factorise the determinant in a similar manner to before 

J 22 |. (A.47) 

| J 22 1 contains no 5-function. Because the Schur compliment is proportional to 
D and contains no 5-functions, it will not contribute when we integrate over 
it. But the single (^-function, multiplied by dx/dnj, in J n remains. We can use 
a stopping criterion with dx/diij <^ 1 (setting dx/diij = renders the above 
argument invalid, but we can safely set it arbitrarily close to zero then take a 



J 
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limit) will remove the effect of the ^-function in a numerical simulation; and 
we use this result to justify our numerical tests in section 5.4. But in a real 
life simulation we cannot use a fixed x, and we have to determine whether 
the ^-function in the Jacobian will affect the final results. Except at precisely 
x — 0, the Jacobian is 1. 

A. 2. 4 Discussion of impact of the S-function Jacobian 

We now turn our attention to the question of whether the (^-function of equa- 
tion (A. 47) will affect the equilibrium ensemble of the Monte Carlo. 

The updating procedure needs to satisfy 

W C [U'\ = J d[U]P[U' <- U]W C [U], (A.48) 

where the probability of moving from a gauge field U to the gauge field U' is 
P[U' <— U], and W c is the canonical distribution for the gauge fields. This is 
satisfied by the detailed balance condition 

J d[$]d[&]P[U' <- U, §}W C [U, $] = J d[$\d[&]P[U <- U', $]W C [U', $]. 

(A.49) 

To prove the detailed balance condition for the HMC [81], we define 

P n [n] =e~^ 2 
W C [U,$] =e"* t ^I*e- 59[l/] 
P A ([W,^,U'] <- [n, $,[/]) =min(l,e- £; P I ''*'^ +£ P I '*> I/ l) 
P n W c [[/] =e -s(tn,*,m) 

p MD ([n', $, c/'] <- [n, $, c/]) =5([n', $, c/'] - t md [u, $, c/]), (A.50) 

where T MD is the molecular dynamics trajectory, and 
P[U' *-U,$]= J d[U]d[U']Pn[Il] 

p A ([W, $, c/'] <- [n, $, c/])PMD([n', $, u'\ <- [n, 0, c/]). 

(A.51) 

If the Jacobian (which is introduced when the variables are changed in the 
Pmd term) is of the form J = 1 + J (11)6(11 — II C ), then when we perform the 
integration over momentum, the detailed balance condition given in equation 
(A.49) will pick up an additional term when integrating over momentum fields. 

It is tempting to argue that because the condition x = will never be encoun- 
tered in a numerical simulation we can neglect this Jacobian while implement- 
ing the algorithm; we can, for example, place the logarithm of the Jacobian in 



49 



the Metropolis step, and say that we will account for it precisely when x — 0, 
which will never happen in a numerical simulation. However, arguments based 
around saying that the condition will never be encountered are in effect saying 
that the measure (in terms of the integration over the initial momentum field) 
is zero; which is not enough to counter the effect of a Dirac 5-function. There- 
fore another argument is needed to demonstrate that this will not contribute 
to the final ensemble. One can also argue the Jacobian will not contribute 
because we can use an approximate overlap operator, which we will gradually 
transform to the sign function. The algorithm with the approximate overlap 
operator will resemble the transmission/reflection algorithm, and it is clearly 
correct. However, this argument fails because as the approximation of the sign 
function improves so the molecular dynamics time step must decrease to cor- 
rectly resolve the step. We, however, are working at a far larger time step than 
which is necessary to resolve the sign function: the two cases are different. If 
algorithm works at infinitesimal time-step it does not necessarily follow that 
it is valid at a finite time step. Therefore, we must find another way of proving 
that this 5-function does not contribute, or, if it does contribute, how we can 
compensate for it. 

We proceed by explicitly performing the momentum integral in the detailed 
balance equation over the Dirac 5-functions from the Jacobian; and we shall 
show that the contribution from the (^-functions cancel out, meaning that 
detailed balance is maintained. This can be done easily because there are as 
many 5— functions as there as momentum integration variables. We begin by 
introducing auxiliary momentum fields Hi (i — 1 . . . Nu), which correspond to 
the momentum fields at each kernel eigenvalue crossing during the trajectory. 
The gauge field at the crossing is C/j, leading to a reflection condition depending 
on a variable X{. These additional momentum fields are, of course, precisely 
determined from either the initial momenta using the trajectory Tj[n, $, U, ], 
or from the final momenta and the reverse trajectory T^ l \UI ', $, U']; and these 
conditions lead to a delta-function in the integral for each auxiliary momentum 
field introduced. Since the various x^s are not degenerate with respect to the 
initial gauge field and momentum, we can write n(l + S( x i)) = 1 + J2$( x i)- 
Using this notation, the detailed balance equation reads 

J d&d$P[U' <- U,$]W C [U,$] = J dU'dUd^d^UdUiPA 

( N n 

8([W, $, U'\ - T MD [U, $, U}) 1] S([Uj, $, Uj] - T t [U, $, U})+ 

\ j=l 

N„ i N u 

Y / ^)J(x l )l[5([u t ,<i>,u t ]-T t [u,^,u]) n san^^-TT^n', $,[/']) 

i=l j=l j=i+l 

(A.52) 

J(xi) is the coefficient of the 5— function in the Jacobian. Using the notation 
of equation (46), we can convert the delta function from a function of x to 
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a function of (Ilj,^) by multiplying by dx i d m. y and summing over each 
possible solution for (Ilj,^) which gives x — 0. Since x is quadratic in each 
(Hj,?7fc), which corresponds to one component of the momentum field, there 
will be two possible solutions to the equation x = for each of the N k mo- 
mentum field components; and we will generate a 5— function for each of these 
solutions. Using x — 1 + Ad/(Y,k(Vk, I^) 2 , it follows that 



r / 



5{ Xl )J{ Xl ) =2]T 

k L 



[Ui, Vk) - 



V 



(IT, 7] k )r] k 



( 



V 



\ 



Ad+ J2(^Vm) 2 

m^k 



En(nj,?7 n ) 2 

'Ilj, r) k )r) k 



J(Xi). 

(A.53) 



It is easily observed that the coefficient of the 5— function for each k is an 
odd function of (Hi,r)k) multiplied by J( X i). In particular, if J( X i) is an even 
function of (IL,,?7 n ), then the coefficients for the positive (n^, 77) and negative 
(n i5 77) (^-functions will precisely cancel, removing all the 5— function terms 
from the detailed balance equation. So the task of demonstrating that this 
5— function will not contribute to the final ensemble can be reduced to a 
demonstration that J(x) is an even function of the momenta. 

From equation (A.42), we see that J(x) = dx/d^rj^iF^ - Fj£). The Jaco- 
bian is the determinant of the N k x N k square matrix 



(r h ,F^-Fg)dx/d(Tl,r, j )5( 



x 



(A.54) 



Noting that (rji, F^) = at x = 0, this matrix is the outer product of two 
vectors, both of which are odd functions of (r/j, Hi). Therefore, J(x«) is an even 
function of (rjj, Hi). As previously argued, this means that the presence of the 
5— function will not affect the final ensemble. We demonstrate that J(xi) is 
even for an explicit example below. 

It might be thought that this argument might be invalid because we cannot 
sample both positive and negative II with equal probability. If the trajectory 
starts in one particular topological sector, then it can only approach the topo- 
logical wall from one direction (unless there is more than topological index 
change in the trajectory). However, our calculation does not depend on the 
sign of II but the sign of (11,7/). The algorithms presented in this article are 
all even functions of 7/; in practice, the sign of 7/ will be chosen randomly. In 
the detailed balance equation, since the both the molecular dynamics trajec- 
tory, T, and the metropolis acceptance probability, Pa, are independent of the 
sign of 77, both choices for 77 lead to the same final gauge field, and we must 
include this choice of sign by summing over the two possibilities, weighting 
them by the appropriate probability; if the algorithm is designed without bias, 
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this probability will be 1/2. In other words, it is not the integration over the 
auxiliary momentum fields but this sum over the signs of 77 which causes the 
5— functions to cancel. Thus the direction in which the trajectory approaches 
the topological sector wall, and the limited sampling of the momentum fields 
are both unimportant. 

As an example, we now explicitly calculate the Jacobian for one particular 
case to demonstrate how this works in practice. We assume that there is just 
one momentum crossing in the trajectory, and use the update 



F£=7rr+77i(II-,77) ^ 



'1 + 



Ad 



-1 - 



(IT", 77)2 , 
r c ((F~ + F + )-r ]l (r ] ,F~ + F + )) 
Fg =tt- - 2 Vi (r), rr) - 2t c (F~ - 77,(77, F~)) 
FV=q- + T c (F*-n7) 

F& =q~ + r c (F* + n-) 
4d 



x =1 + 



( dx \ 
( pn P ni 



;n-,77) 2 

277 



x=0 
x=0 

J(x) 



(n-,r7) 
(n- 17) 

/ dx 



{^Ti Ri/ 



x=0 



x=0 



-2. 



(A.55) 



Hence J is an even function of (11,77). The additional and unwanted term in 
the detailed balance equation, once we have integrated out all the auxiliary 
momentum integration variables except (n^ , 77) , reads 



-2 / d&d$J2 /( n ^) 



<$((ni,7fc)- V^4d)- 



+ 5((n i ,r H ) + 

; (n,,77,) 



Pa = 
(A.56) 



Hence the detailed balance condition is, in fact, maintained despite the 5- 
functions in the Jacobian. 



Numerically, in section 5.4, we have compared the algorithm with an approach 
that explicitly does satisfy detailed balance (reflecting when \d\ > d max and 
using n + = n - when \d\ < d max ), but does not necessarily conserve energy. In 
an extensive simulation we found no difference in the observables calculated. 
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A. 3 Correction step error of 0{r 2 ) 



Suppose that there is a crossing between time r = (where the gauge field, 
momentum, and fermionic force are U , IT respectively), and time r = At 
(with gauge field Ui, momentum IT). The gauge fields immediately before 
and after the eigenvalue crossing are Ud and U u . 17 If S(U) = E — |n 2 is the 
action, then the total force F' acting on the momentum is 

We shall demonstrate that the energy difference, AE, between time r = At 
and time r = is of order At 2 . The energy change between time At and time 
is 

ae = 1(11!, no - l(n , n ) + s^) - s(u ). (A.58) 

In this notation, our update for transmission (50), with s 2 = 1), and neglecting 
terms of 0(At 2 ) or higher is 

At 

n- =n - — F o; 

n + =n- - r c (1 - |J7) fal) (F- - F+) + t c ^Tt(F- - F+) + 



(vM, H- - |(F- + F+)) + r^ 2 , n- - |(F" + F+))) x 



r c (F-^)(n-^)-r e (F+ iy)(n+ iy) 
^ * (^ 2 , n- - f (F- + F+)) 2 + ( % 2 , n- - * (F- + 



n x =n+ - — Fi 

2 

U+ =e -ircTl+ e +ir c U- JJ 

,AT -n+ rr + 



^ = e ^r^f/+ (A.59) 
U d =e<^ c )n- Uo 

U x = e <^- Tc ) u+ U u . (A.60) 



17 Note that in practice U u = Ud = U c - there is no discontinuity in the gauge field 
at the crossing. The notation here is purely for clarification. 
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There is a jump in the action S at the eigenvalue crossing, 

S(U U ) - S(U d ) = -2d. (A.61) 
Using (11+ - n_, F + - F_) = 0(Ar 2 ), we can show that 

(n + , n+) - (n-, rr) =4d + 2r c (n-, f + ) - 2r c (n- f~)+ 

r c (n + -n_,F + + F_) + 0(Ar 2 ) 

=4d + 2r c (ir , F + ) - 2r c (n _ , F~) + 0(Ar 2 ), (A.62) 

(nx, no - (n , n ) = 4d-Ar((n , f ) + (it, f,)) 

- 2r c ((F , n~) - (Fx, 11+)) + 0(Ar 2 ), (A.63) 



S(tfi) - S(U ) =S (U (r+ + - r c ) p 



At 



/Ar \ 95 



Pi + 



At \ dS_ 

A' 



TTi + CKAr 2 ) 



= -2rf + ^((n + ,F 1 ) + (n-,F )) + 

r c ((n-,F-)-(n + ,F + )) + 0(Ar 2 ). 



So, 



A£ = 0(Ar 2 ). 



(A.64) 



(A.65) 



Therefore, our modified correction update for transmission conserves energy 
up to 0(At 2 ). It can also be shown that the energy violating reflection term 
is also 0(Ar 2 ). Without the additional correction term to account for the 
O(At) error, the energy violation would be 

T i ((f~ - f + , v ) (n- + n + , 77) - (f- + f + , v ) (ir - n+, v j) + o(At). 

(A.66) 



B Two Crossings within the same Molecular Dynamics Step 



There are several situations when it is difficult to resolve whether or not there 
has been an eigenvalue crossing. Firstly, when an eigenvalue has a minimum 
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close to zero, and secondly when there are two small eigenvalues. In the first 
case, one can identify the minimum by looking at the sign of the differential of 
the eigenvalue with respect to computer time. If the eigenvalue is sufficiently 
small that there is a danger that the crossing may have occurred, one can 
search for the zero eigenvalue. In the second case one has to search for crossings 
if there is a large mixing between the two eigenvectors, or an initial Newton- 
Raphson extrapolation indicates that they might cross. In both cases, one 
moves to the point of the first crossing, changes the momentum in the usual 
manner, moves to the second crossing (if necessary, i.e. the first crossing was 
transmission), and changes the momentum again. Care needs to be taken to 
ensure that the algorithm will detect the potential crossing in both directions, 
otherwise there could be a breakdown in reversibility. 



More serious is the possibility of mixing between a small positive and small 
negative eigenvalues of the kernel operator. As an example, let us consider a 
system with just two eigenvectors and \1jj2) ■ One can calculate the mixing 
between the eigenvectors using a similar method to the one used in section 
2.2, except in this case we will not expand in small At. One starts as before 
with the eigenvalue equations at times r and r + At: 



1^2) + \s 2 ) 



= cos0|^i) +e^sin0|V> 2 ) 
= cos #^2) +e-^sin0|V>i) 



(B.l) 



One can now easily solve for the mixing angles 9 and 0: 



e ^ = 



\l (M8Q\1>i) 

(A 2 - Ai + (HWM 



2 



tan 2 9 =A 




(B.2) 



An expansion in small St (with SQ = StOQ/Ot + 0(St 2 )) gives the same 
expression that was derived in section 2.2. However, this expansion is only 
valid if 



and 



A2 — Ai 



< 1 



< 1. 
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These conditions fail if two eigenvalues are close to each other and both have 
maxima or minima in the same time step. One can write the fermionic force 
contribution from the small eigenvectors (excluding the delta function term) 
as 

2 hm ^sin^cos0(X| {75, + \MM |X)(e(A 1 ) - e(A 2 )). 

If there occurs a small negative eigenvalue and a small positive eigenvalue — 
mixing between eigenvectors with the same sign does not contribute to the 
force — and they have minima at the same time step, there could potentially 
emerge a very large fermionic force. 

In our simulations we have indeed observed this effect. Most of the metropolis 
rejections on larger lattices can be traced back to it. In a subsequent paper 
we are going to describe a solution to this problem [55]. 
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